Single pass graph sparsification in distributed stream processing
Abstract
We give a distributed one-pass streaming algorithm for graph sparsification. Besides producing a sparsifier, our algorithm maintains, in a distributed manner, a hierarchy of UNION-FIND data structures that efficiently support queries of strong connectivity between pairs of vertices. An important component of the algorithm is an implementation of UNION-FIND queries over an Active Distributed Hash Table that guarantees good load balancing properties. This is achieved via a single step of what is known in the literature as the zig-zag heuristic. We provide theoretical guarantees for the load balancing achieved by this heuristic, and show how the structure of our sparsification scheme ensures good load balancing across the hierarchy of UNION-FIND data structures maintained by the algorithm. We also present simulation results on synthetic as well as real-world data verifying the load balancing properties and the quality of approximation of strong connectivities achieved by the algorithm.
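For readers unfamiliar with the UNION-FIND primitive named in the abstract, the following is a minimal single-machine sketch in Python. It is an illustration under stated assumptions, not the paper's method: it does not model the Active Distributed Hash Table, the zig-zag heuristic, or the hierarchy of structures, and the names `UnionFind` and `process_stream` are hypothetical. It only shows how one pass over an edge stream can answer plain connectivity queries, keeping a spanning forest rather than a true cut sparsifier.

```python
class UnionFind:
    """Sequential union-find with path compression and union by size.

    Illustrative only: the paper distributes this structure over a hash
    table, which this sketch does not attempt to reproduce.
    """

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, v):
        # Register unseen vertices lazily as singleton components.
        if v not in self.parent:
            self.parent[v] = v
            self.size[v] = 1
            return v
        # First walk up to the root of v's component.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Then compress the path so later queries are faster.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, u, v):
        """Merge the components of u and v; return True if they differed."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        # Attach the smaller tree under the larger one.
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        return True

    def connected(self, u, v):
        return self.find(u) == self.find(v)


def process_stream(edges):
    """Single pass over an edge stream.

    Keeps an edge only if it joins two previously separate components,
    so the kept edges form a spanning forest (an assumption made for
    illustration; the paper's sparsifier preserves cuts, not just
    connectivity).
    """
    uf = UnionFind()
    kept = []
    for u, v in edges:
        if uf.union(u, v):
            kept.append((u, v))
    return uf, kept
```

Calling `process_stream` on a stream of vertex pairs returns the structure together with the retained edges, and `connected(u, v)` then answers whether two vertices lie in the same component.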
Similar resources
Single pass sparsification in the streaming model with edge deletions
In this paper we give a construction of cut sparsifiers of Benczúr and Karger in the dynamic streaming setting in a single pass over the data stream. Previous constructions either required multiple passes or were unable to handle edge deletions. We use Õ(1/ε) time for each stream update and Õ(n/ε) time to construct a sparsifier. Our ε-sparsifiers have O(n log n/ε) edges. The main tools behind o...
Sparsification Algorithm for Cut Problems on Semi-streaming Model
The emergence of social networks and other interaction networks has brought to the fore the question of processing massive graphs. The (semi) streaming model, where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), is a useful and efficient model for processing large graphs. In many of these graphs the numbers of vertices are significantly less t...
Graph Sparsification in the Semi-streaming Model
Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming setting, but the progress on graph problems has been limited. A main reason for this has been the existence of linear space lower bounds for even simple problems such as determining the connectedness of ...
Scalable Linked Data Stream Processing via Network-Aware Workload Scheduling
In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...
Network-Aware Workload Scheduling for Scalable Linked Data Stream Processing
In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question how the data and the workload should be partitioned and distributed. A uniform scheduling strategy—a uniform distribu...
Publication date: 2011